This data set covers over 113,000 borrowers who applied for loans with Prosper. The analysis looks into variables that may affect a borrower’s APR or Prosper rating. We will use 12 of the 81 variables in this analysis, plus a derived AnnualIncome variable, for 13 columns in total.
## [1] 113937 13
## 'data.frame': 113937 obs. of 13 variables:
## $ Term : Ord.factor w/ 3 levels "12"<"36"<"60": 2 2 2 2 2 3 2 2 2 2 ...
## $ LoanStatus : Factor w/ 12 levels "Cancelled","Chargedoff",..: 3 4 3 4 4 4 4 4 4 4 ...
## $ BorrowerAPR : num 0.165 0.12 0.283 0.125 0.246 ...
## $ ProsperRating..Alpha.: Ord.factor w/ 8 levels ""<"HR"<" E"<"D"<..: 1 7 1 7 4 6 NA 5 8 8 ...
## $ BorrowerState : Factor w/ 52 levels "","AK","AL","AR",..: 7 7 12 12 25 34 18 6 16 16 ...
## $ IsBorrowerHomeowner : logi TRUE FALSE FALSE TRUE TRUE TRUE ...
## $ CreditScoreRangeLower: int 640 680 480 800 680 740 680 700 820 820 ...
## $ BankcardUtilization : num 0 0.21 NA 0.04 0.81 0.39 0.72 0.13 0.11 0.11 ...
## $ DebtToIncomeRatio : num 0.17 0.18 0.06 0.15 0.26 0.36 0.27 0.24 0.25 0.25 ...
## $ StatedMonthlyIncome : num 3083 6125 2083 2875 9583 ...
## $ LoanOriginalAmount : int 9425 10000 3001 10000 15000 15000 3000 10000 10000 10000 ...
## $ MonthlyLoanPayment : num 330 319 123 321 564 ...
## $ AnnualIncome : num 37000 73500 25000 34500 115000 ...
## Term LoanStatus BorrowerAPR
## 12: 1614 Current :56576 Min. :0.00653
## 36:87778 Completed :38074 1st Qu.:0.15629
## 60:24545 Chargedoff :11992 Median :0.20976
## Defaulted : 5018 Mean :0.21883
## Past Due (1-15 days) : 806 3rd Qu.:0.28381
## Past Due (31-60 days): 363 Max. :0.51229
## (Other) : 1108 NA's :25
## ProsperRating..Alpha. BorrowerState IsBorrowerHomeowner
## :29084 CA :14717 Mode :logical
## C :18345 TX : 6842 FALSE:56459
## B :15581 NY : 6729 TRUE :57478
## A :14551 FL : 6720
## D :14274 IL : 5921
## (Other):12307 : 5515
## NA's : 9795 (Other):67493
## CreditScoreRangeLower BankcardUtilization DebtToIncomeRatio
## Min. : 0.0 Min. :0.000 Min. : 0.000
## 1st Qu.:660.0 1st Qu.:0.310 1st Qu.: 0.140
## Median :680.0 Median :0.600 Median : 0.220
## Mean :685.6 Mean :0.561 Mean : 0.276
## 3rd Qu.:720.0 3rd Qu.:0.840 3rd Qu.: 0.320
## Max. :880.0 Max. :5.950 Max. :10.010
## NA's :591 NA's :7604 NA's :8554
## StatedMonthlyIncome LoanOriginalAmount MonthlyLoanPayment
## Min. : 0 Min. : 1000 Min. : 0.0
## 1st Qu.: 3200 1st Qu.: 4000 1st Qu.: 131.6
## Median : 4667 Median : 6500 Median : 217.7
## Mean : 5608 Mean : 8337 Mean : 272.5
## 3rd Qu.: 6825 3rd Qu.:12000 3rd Qu.: 371.6
## Max. :1750003 Max. :35000 Max. :2251.5
##
## AnnualIncome
## Min. : 0
## 1st Qu.: 38404
## Median : 56000
## Mean : 67296
## 3rd Qu.: 81900
## Max. :21000035
##
From the summary statistics, we have 113,937 observations of 13 variables: the 12 selected variables plus the derived AnnualIncome.
One thing to note is that DebtToIncomeRatio (where 1 should be the highest value), StatedMonthlyIncome, and MonthlyLoanPayment have outliers.
Looking at the terms, 36 months is the most common loan term (87,778 loans, about 77%).
Most of the loans (about 83%) are still current or completed. Some are charged off or defaulted (about 15%), while a small number are past due or in final payment.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00653 0.15629 0.20976 0.21883 0.28381 0.51229 25
There appears to be a spike in APR around 36%. APR is one of the main variables of interest, and the rest of the analysis looks at what affects it. The mean APR is about 22% and the median is about 21%.
##          HR     E     D     C     B     A    AA  NA's
##  29084  6935     0 14274 18345 15581 14551  5372  9795
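29,084 of the 113,937 ratings are blank, so the data frame is subset to exclude blanks. From looking at the data, AA is the highest rating and HR is the lowest. Note that the E level shows a count of 0 while 9,795 values are NA; since the level is listed as " E" (with a leading space) in the structure output, the E ratings were most likely converted to NA when the factor was built.
Also, C is the most common rating among rated borrowers (18,345).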
5,515 of the 113,937 state values are blank, so the data frame is subset to exclude blanks. California has the most borrowers (14,717), more than twice as many as any other state. Texas, New York, Florida, and Illinois follow, close to one another at roughly 5,900 to 6,800 each. This may reflect the larger populations of bigger states or states with more populated cities.
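As a rough sketch of the blank-removal step, assuming a data frame named `loans` (a hypothetical name, filled here with made-up toy values rather than the real Prosper data), one way to exclude blanks and drop the empty factor level is:

```r
# Toy data standing in for the Prosper loan data (values are made up).
loans <- data.frame(
  BorrowerState = factor(c("CA", "", "TX", "CA", "")),
  BorrowerAPR   = c(0.16, 0.12, 0.28, 0.21, 0.36)
)

# Keep only rows with a non-blank state, then drop the unused "" level
# so it no longer appears in tables or plots.
loans_state <- droplevels(subset(loans, BorrowerState != ""))

table(loans_state$BorrowerState)
```

Without `droplevels()`, the blank level would still be listed with a count of zero.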
## Mode FALSE TRUE
## logical 56459 57478
There are 57,478 homeowners and 56,459 borrowers who don’t own a home, so the two groups are almost evenly split.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0 660.0 680.0 685.6 720.0 880.0 591
##
## 0 360 420 440 460 480 500 520 540 560 580 600
## 133 1 5 36 141 346 554 1593 1474 1357 1125 3602
## 620 640 660 680 700 720 740 760 780 800 820 840
## 4172 12199 16366 16492 15471 12923 9267 6606 4624 2644 1409 567
## 860 880
## 212 27
For the lower bound of the credit score range, most borrowers fall around 660 to 700.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 0.310 0.600 0.561 0.840 5.950 7604
Per the variable definitions, bankcard utilization is a percentage, so anything above 1 is treated as an error.
Visually, more than half of borrowers have more than 50% utilization, which matches the median of 0.60.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 0.140 0.220 0.276 0.320 10.010 8554
Per the variable definitions, DebtToIncomeRatio is a percentage, so anything above 1 is treated as an error.
Most of the borrowers’ debt-to-income ratios are around 25% to 30%.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 3200 4667 5608 6825 1750003
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 38404 56000 67296 81900 21000035
Used 99% quantile to remove outliers.
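A minimal sketch of the 99% quantile cutoff, using a made-up income vector (the name `income` and its values are illustrative only, and the original may have applied the cutoff as a plot limit instead of a filter):

```r
# Toy incomes with one extreme value (made-up numbers).
income <- c(3200, 4667, 5608, 6825, 2500, 4100, 5200, 3900, 4800, 1750003)

# Upper cutoff at the 99th percentile; na.rm = TRUE guards against NAs.
cutoff <- quantile(income, probs = 0.99, na.rm = TRUE)

# Keep only non-missing values at or below the cutoff.
income_trimmed <- income[!is.na(income) & income <= cutoff]

summary(income_trimmed)
```

The same cutoff value can be passed as the upper axis limit of a histogram if the goal is only to zoom in on the bulk of the distribution.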
Most borrowers’ monthly income is around $4,500 to $5,000, and annual income is around $40,000 to $60,000.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1000 4000 6500 8337 12000 35000
Used 99% quantile to remove outliers.
Borrowers usually borrow around $4,000, $10,000, or $15,000. I wonder if the higher amounts are for homeowners?
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 131.6 217.7 272.5 371.6 2251.5
Used 99% quantile to remove outliers.
Most borrowers pay around $150 per month on their loan.
There are 113,937 loans in the dataset with 12 features (Term, Loan Status, Borrower APR, Prosper Rating, Borrower State, Borrower Homeownership, Credit Score, Bank Utilization, Debt to Income Ratio, Stated Monthly Income, Loan Original Amount, and Monthly Loan Payment).
The Prosper rating variable is an ordered factor with the following levels.
(worst) ——> (best) Prosper rating: HR, E, D, C, B, A, AA
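The ordering above can be sketched as an ordered factor. The vector `rating` below is toy data, and the variable names are illustrative; the level strings must match the data exactly, since any value that does not match a level becomes NA.

```r
# Toy ratings (made up); the real column is ProsperRating..Alpha.
rating <- c("C", "AA", "HR", "E", "B", "A", "D", "C")

# Levels listed from worst to best. A level typed as " E" (leading space)
# would not match "E" and would turn those values into NA.
rating_levels <- c("HR", "E", "D", "C", "B", "A", "AA")
rating_ord <- factor(rating, levels = rating_levels, ordered = TRUE)

# useNA = "ifany" surfaces any values that failed to match a level.
table(rating_ord, useNA = "ifany")

# Ordered factors support comparisons: is C a worse rating than AA?
rating_ord[1] < rating_ord[2]
```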
Other observations:
The main features of interest in the data set are Borrower APR and Prosper rating. I would like to determine which features affect APR. I expect that the Prosper rating, along with other variables, affects the borrower’s interest rate.
Annual income and home ownership may also have an effect on the borrower’s APR.
I created the AnnualIncome variable (StatedMonthlyIncome times 12).
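The values in the output earlier (for example, a monthly income of 6125 next to an annual income of 73500) are consistent with annual income being twelve times the stated monthly income. Assuming that is how the variable was derived, a minimal sketch on a toy data frame looks like this:

```r
# Toy rows; the monthly values mirror the kind of figures shown in the output.
loans <- data.frame(StatedMonthlyIncome = c(3083.333, 6125, 2083.333))

# Derived variable: twelve months of stated income (an assumed derivation).
loans$AnnualIncome <- loans$StatedMonthlyIncome * 12

loans
```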
I noticed that bank utilization and debt-to-income ratio have maximum values higher than 1.
From reading the variable dictionary, these values are ratios from 0 to 1.
I subset the data to exclude these outliers.
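A hedged sketch of that subsetting step, on made-up rows (the data frame name `loans` is an assumption, and the original may have filtered each variable separately):

```r
# Toy rows covering valid ratios, out-of-range ratios, and a missing value.
loans <- data.frame(
  BankcardUtilization = c(0.21, 5.95, NA, 0.81, 0.39),
  DebtToIncomeRatio   = c(0.18, 0.26, 0.06, 10.01, 0.36)
)

# Keep rows where both ratios are at most 1.
# subset() also drops rows where the condition evaluates to NA.
loans_ratio <- subset(loans, BankcardUtilization <= 1 & DebtToIncomeRatio <= 1)

loans_ratio
```

Because rows with a missing ratio are dropped too, this filter removes more than just the out-of-range values, which is worth keeping in mind when comparing counts.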
From the analysis, borrowers with a higher Prosper rating tend to have a lower APR.
Comparing Prosper rating by state, I see concentrations in CA, FL, GA, IL, NY, OH, and TX. I’ll look into these states further.
From research, the possible credit score range is 300 to 850, so I set limits to this range.
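One way to apply the 300 to 850 limits is to filter the scores directly, sketched here on a toy vector (the original may instead have set axis limits on the plot):

```r
# Toy lower-bound scores, including values outside the valid range and an NA.
score <- c(0, 360, 640, 680, 720, 850, 880, NA)

# TRUE only for non-missing scores inside the 300-850 range.
in_range <- !is.na(score) & score >= 300 & score <= 850

score[in_range]
```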
Visually, a high credit score seems to lower APR somewhat, but there is a wide range of APRs concentrated between credit scores of 640 and 725.
When comparing states, credit scores have a greater range in states with highly populated cities.
There are probably other factors affecting this.
Visually, it doesn’t seem like bank utilization has a major effect on APR. One thing to note is that regardless of utilization, there is a concentration around 36% APR.
Comparing bank utilization to credit score, borrowers with lower bank utilization tend to have higher credit scores.
Visually, it doesn’t seem like the debt-to-income ratio has a major effect on APR. One thing to note is that regardless of the ratio, there is a concentration around 36% APR.
Comparing home ownership to debt-to-income ratio, there doesn’t seem to be much of a difference.
Visually, it doesn’t seem like owning a home has a major effect on APR. One thing to note is that regardless of whether a borrower owns a home, there is a concentration around 36% APR.
APR correlates with Prosper rating and credit score, and the Prosper rating looks like it groups borrowers into credit score ranges.
What I found interesting was the comparison of states to the Prosper rating. I will look at how CA, FL, GA, IL, NY, OH, and TX compare to one another in the multivariate analysis.
Between APR and Prosper Grade
Each state follows a similar breakdown between APR and Prosper rating. CA and TX have a higher concentration of loans with less than 10% APR. In IL, some HR-rated borrowers were able to get better rates.
Looking at the credit scores, some HR-rated borrowers got better rates than others.
Otherwise, the relationship looks like borrowers with better credit scores have better APRs and Prosper ratings.
As bank card utilization goes up, you see fewer AA ratings and more HR ratings.
By rating, most AA borrowers have utilization in the 0 to 40 percent range. HR borrowers reach almost 100%. All the other ratings are around 60%.
Interestingly, more homeowners than non-homeowners have AA ratings.
From comparing variables to Prosper rating and APR, each variable followed a pretty similar relationship from what was explored in the bivariate plots
I found it surprising that homeowners had higher Prosper ratings than non-homeowners.
## # A tibble: 7 x 3
## BorrowerState n freq
## <fct> <int> <dbl>
## 1 CA 14717 0.294
## 2 TX 6842 0.136
## 3 NY 6729 0.134
## 4 FL 6720 0.134
## 5 IL 5921 0.118
## 6 GA 5008 0.0999
## 7 OH 4197 0.0837
## # A tibble: 7 x 3
## BorrowerState n freq
## <fct> <int> <dbl>
## 1 CA 14717 0.129
## 2 TX 6842 0.0601
## 3 NY 6729 0.0591
## 4 FL 6720 0.0590
## 5 IL 5921 0.0520
## 6 GA 5008 0.0440
## 7 OH 4197 0.0368
What I found interesting about this is the intensity of ratings in different states. The states with the higher concentrations may stand out because they contain some of America’s biggest cities.
In this set, CA makes up 29% of the borrowers in the 7 states chosen for the analysis and 13% of all borrowers.
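The two `freq` columns in the tables above differ only in their denominator. A short base R sketch, using the counts shown in the tables (the variable names are illustrative, and the original output was likely produced with dplyr rather than this code):

```r
# Borrower counts for the seven states, copied from the tables above.
state_n <- c(CA = 14717, TX = 6842, NY = 6729, FL = 6720,
             IL = 5921, GA = 5008, OH = 4197)

# Total number of borrowers in the full data set.
total_borrowers <- 113937

# Share within the seven selected states (first table).
share_of_selected <- state_n / sum(state_n)

# Share of all borrowers (second table).
share_of_all <- state_n / total_borrowers

round(cbind(share_of_selected, share_of_all), 3)
```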
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0 660.0 680.0 685.6 720.0 880.0 591
Though some of the borrowers had perfect credit scores, they still had a high APR.
Most of the borrowers’ credit scores ranged from 660 to 720.
## # A tibble: 2 x 4
## # Groups: IsBorrowerHomeowner [2]
## IsBorrowerHomeowner ProsperRating..Alpha. n Percentage
## <lgl> <ord> <int> <dbl>
## 1 TRUE AA 3847 0.0669
## 2 FALSE AA 1525 0.0270
Homeowners generally had more AA ratings: about 6.7% of homeowners are rated AA, compared with about 2.7% of non-homeowners. It could be that homeownership plays a big part in getting a better rating.
There are 3,847 homeowners in the AA rating, and 1,525 non-homeowners in this rating
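The Percentage column above can be reproduced from the counts reported in this analysis. This is a small sketch with illustrative variable names, dividing each group’s AA count by that group’s total number of borrowers:

```r
# AA-rated borrowers in each group, from the table above.
aa_n <- c(homeowner = 3847, non_homeowner = 1525)

# All borrowers in each group, from the IsBorrowerHomeowner summary.
group_n <- c(homeowner = 57478, non_homeowner = 56459)

# Share of each group that holds an AA rating.
aa_share <- aa_n / group_n

round(aa_share, 4)
```

Comparing shares rather than raw counts matters less here because the two groups are nearly the same size, but it keeps the comparison fair if the groups differ.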
From doing the analysis, I found that APR follows the Prosper rating pretty closely. Part of the struggle was trying to find relationships, only to realize that some of the information isn’t complete. For example, I would have loved to look at rating by occupation, but quickly saw that the “Professional” value made up most of the variable. I found that there was a lot of missing information.
I was surprised by the impact homeownership has on securing an AA rating and a decent APR.
I would like to revisit this analysis with a more complete data set that gives a better breakdown of occupation, as well as city-level data for the states that have the most borrowers.